DOCUMENT RESUME

ED 177 197                                             TM 009 690

TITLE           Operational Characteristics of a Rasch Model Tailored
                Testing Procedure when Program Parameters and Item
                Pool Attributes are Varied.
INSTITUTION     Missouri Univ., Columbia.
SPONS AGENCY    Office of Naval Research, Washington, D.C. Personnel
                and Training Branch.
PUB DATE        Apr 79
CONTRACT        N00014-77-C-0097
NOTE            38p.; Paper presented at the Annual Meeting of the
                National Council on Measurement in Education (San
                Francisco, California, April 9-11, 1979)
EDRS PRICE      MF01/PC02 Plus Postage.
DESCRIPTORS     *Computer Programs; Educational Testing; *Item
                Analysis; Item Banks; *Mathematical Models;
                Simulation; Standard Error of Measurement;
                *Statistical Bias; Test Construction; Test Items;
                *Test Reliability
IDENTIFIERS     Computer Assisted Testing; Latent Trait Model; One
                Parameter Model; *Tailored Testing

ABSTRACT


Simulated tailored tests were used to investigate the relationships
between characteristics of the item pool and the computer program, and
the reliability and bias of the resulting ability estimates. The computer
program was varied to provide for various step sizes (differences in
difficulty between successive steps) and different acceptance ranges
(precision with which selected items must match desired item
difficulties). Item analysis and ability estimates were based on a
one-parameter logistic (Rasch) model. Item pools varied in size from 9 to
181 items, and in shape to include normal, bimodal, rectangular, and
positively skewed distributions of item difficulty. In addition, item
difficulty distributions were designed to fit either precise theoretical
distributions or actual test distributions. The acceptance range
parameter was important because it influenced the point at which testing
was discontinued; there was a fixed maximum of 20 items but a minimum in
some circumstances of [illegible] items. Two different tailored test
programs were used, TREE1P and SIM1P. Results were based on 25 simulated
administrations under each of the various conditions. (CTH)



Reproductions supplied by EDRS are the best that can be made
from the original document.



Operational Characteristics of a Rasch Model
Tailored Testing Procedure when Program
Parameters and Item Pool Attributes are Varied
Wayne M. Patience 
Mark D. Reckase 
University of Missouri-Columbia 

INTRODUCTION 

Tailored testing, the selection and scoring of test items administered
in an interactive fashion to individual examinees, has within the past
decade become the spearhead for application of latent trait models to
achievement and ability measurement. The availability of improved computer
technology has contributed greatly to the increase in the number of systems
presently in operation which administer tailored or adaptive tests. It
should be noted that tailored testing as presented here is synonymous with
many other assigned names such as adaptive testing, response contingent
testing, or sequential testing. While numerous articles have appeared in
the literature which describe the one-parameter logistic model and its
application in a tailored testing setting (see for example Reckase, 1974;
Weiss, 1974; Patience, 1977), little or no literature has been written
discussing operational characteristics of the procedure when program
parameters and item pool attributes are varied. For this report, operational
characteristics refer to how well the tailored testing procedure estimates
a given true ability when provided an item pool and values for the controlling
program parameters. The operational characteristics, item pool attributes,
and program parameters will be described in detail shortly.

Although no literature was found which addressed the effects of varying
program parameters, a few studies have appeared in the literature which
investigated effects of item pool characteristics on the operation of
tailored testing. Jensema (1975), for example, has investigated the
influence of item pool size and item characteristics on a Bayesian
tailored testing procedure. In general, Jensema found that when items are
of adequate quality, it is not necessary to have very large item pools.
Reckase (1976) concurred with Jensema in recommending a rectangular
distribution of item pool difficulty values. In this latter study, the
tailored testing procedure was based on an empirical maximum likelihood
estimation of the ability parameter of the simple logistic (Rasch) model.
Issues worthy of further investigation have surfaced in addition to item
pool attributes, such as the effects of program parameters on the bias and
variance of ability estimation. Bias and variance of ability estimation
constitute the operational characteristics of the tailored testing
procedure. Bias and variance of ability estimation and the program
parameters will now be described more thoroughly.

Several articles have appeared in the literature which discuss bias 
of tailored testing ability estimation with regard to procedural bias toward 
subgroups of an examinee population such as minorities (see for example



Paper presented at the Annual Meeting of the National Council on Measurement
in Education, San Francisco, 1979. This research was supported by contract
number N00014-77-C-0097 from the Personnel and Training Research Programs
of the Office of Naval Research.






Pine and Weiss, 1978). The research reported here did not address this
type of bias. Rather, the ability estimate bias investigated for this paper
concerned the failure of a maximum likelihood tailored testing procedure
to result in an expected value of the ability estimate equal to a
known true ability. In this sense, the attempt was to identify values
for the program parameters, stepsize and acceptance range, as well as item
pool characteristics which would provide the least statistical bias in
ability estimation. The variance of ability estimates was the squared
standard error of the ability estimates for a known true ability. The
desire was to minimize this standard error. These two dependent measures
provided the criteria for judging how well the tailored testing procedure
estimated known abilities when the program parameters and item pool
characteristics were varied.

PURPOSE 

The primary purpose of the research described herein was to determine
the operational characteristics of a one-parameter tailored testing
procedure when program parameters and item pool attributes were varied. The
program parameters investigated were the stepsize and acceptance range.
The stepsize parameter specified the magnitude of movement of the ability
estimate during the initial item selection phase of tailored testing.
The acceptance range parameter determined how deviant the selected item's
difficulty value could be from the requested item difficulty and still be
acceptable for administration. Items were requested by the procedure
to match the ability estimate which was computed after each item response.
The item pool attributes varied were size, shape, and quality. Each of
these variables will now be described more specifically.

Based on the premise of tailored testing that when an examinee answers
an item correctly, the next item administered should be more difficult, and
vice versa, the stepsize program parameter initially controlled how much more
difficult or easy the next item administered was. The selection of items
proceeded utilizing a fixed stepsize until the examinee had answered items
both correctly and incorrectly. After both a correct and incorrect response
had been obtained in the response string, a maximum likelihood ability
estimate was obtained using an iterative search for the mode of the
likelihood distribution. For a more complete description of the item selection
and ability estimation components of this maximum likelihood tailored
testing procedure see Patience, 1977. When an ability estimate had been
obtained, items were selected from the pool to maximize the information
function (Birnbaum, 1968). For the one-parameter model, the information
function is maximized when the difficulty of the selected item equals the
ability estimate. In the past, arbitrary values have generally been chosen
for the stepsize. One of the primary goals of this research was to
empirically investigate the effects of stepsize values on the bias and standard
error of ability estimates. In so doing, the intent was to determine the
optimal stepsize value which would minimize the bias and standard error
of ability estimates.
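
The claim that the one-parameter information function peaks where item
difficulty equals the ability estimate can be checked with a short
derivation. The sketch below assumes the standard logistic form of the
Rasch model with unit discrimination; the symbols P and I are introduced
here for illustration and are not notation taken from the paper.

```latex
\begin{align*}
P(\theta) &= \frac{1}{1 + e^{-(\theta - b)}}, &
I(\theta) &= P(\theta)\,\bigl[1 - P(\theta)\bigr], \\
\frac{dP}{d\theta} &= P(\theta)\,\bigl[1 - P(\theta)\bigr], &
\frac{dI}{d\theta} &= P(\theta)\,\bigl[1 - P(\theta)\bigr]\,\bigl[1 - 2P(\theta)\bigr].
\end{align*}
% Since P(1 - P) > 0 everywhere, dI/d\theta = 0 only when P(\theta) = 1/2,
% that is, when \theta = b, where I(b) = 1/4 is the maximum.
```

Under these assumptions an item contributes the most information, 1/4, to
an examinee whose ability equals its difficulty, which is why the procedure
requests an item at the current ability estimate.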

The second program parameter investigated was the acceptance range.
The acceptance range specified the amount of deviation in difficulty an
administered item could have from the requested item difficulty and still
be acceptable for administration. The acceptance range parameter monitored
the appropriateness of items selected throughout the tailored test, i.e.,






both during item selection based on the fixed stepsize until both correct
and incorrect responses had been obtained, and also during item selection
to maximize the information function for a maximum likelihood ability
estimate. If more than one item was within plus or minus the acceptance
range of the desired item, the item with a difficulty value nearest the
requested value was chosen. If no item was available from the pool within
the specified acceptance range of the difficulty asked for, the tailored
test was terminated. The primary aim then, regarding the acceptance range,
was to determine what value or range of values yielded the least bias and
standard error of ability estimates. Clearly, a small value for the
acceptance range would have ensured that items very near the desired item
difficulty would be administered. On the other hand, too small an acceptance
range value would have increased the chance of premature termination of the
tailored test, which would have induced bias of the ability estimate.
It should be noted that both stepsize and acceptance range interact with
item pool attributes and, therefore, a choice of what values are optimal
may not be made assuming independence of these controlling factors.
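
The acceptance range rule described above can be illustrated with a
minimal Python sketch. The function name `select_item`, the inclusive
boundary test, and the exclusion of already administered items are
assumptions made for illustration; the original FORTRAN code is not
reproduced in the paper.

```python
def select_item(requested, pool, used, acceptance_range):
    """Return the index of the unused item nearest the requested difficulty,
    or None when no unused item lies within the acceptance range."""
    best = None
    for i, b in enumerate(pool):
        if i in used:
            continue  # assumption: an item is administered at most once
        dist = abs(b - requested)
        if dist > acceptance_range:
            continue  # outside plus or minus the acceptance range
        if best is None or dist < abs(pool[best] - requested):
            best = i
    return best


# Nine-item rectangular pool from -3 to +3 (spacing 0.75)
pool = [-3.0, -2.25, -1.5, -0.75, 0.0, 0.75, 1.5, 2.25, 3.0]
print(select_item(1.0, pool, {4}, 0.3))       # index of the 0.75 item
print(select_item(0.375, pool, {4, 5}, 0.3))  # None: the test would terminate
```

With an acceptance range of 0.3, a request at 1.0 is satisfied by the item
at 0.75, while a request at 0.375 finds nothing acceptable once the items
at 0.0 and 0.75 have been used.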

Item pool attributes studied in this research included size, shape,
and quality. Item pools used in this investigation ranged in size from
nine to one hundred and eighty-one items. Shapes of item pool
distributions were normal, rectangular, bimodal, and skewed. Item pool quality
referred to the contrast between actual and idealized pools. Idealized
pools consisted of item difficulty parameters equally spaced from minus
three to plus three with equal discrimination values of one and zero
guessing as assumed by the Rasch model. Actual pools consisted of item
difficulty values obtained from calibration runs using the Wright and
Panchapakesan program based on the Rasch model (1969). In these pools, items
were not equally spaced on the difficulty scale. It should be noted that,
quite clearly, item pool attributes played a substantial role in the utility
of the tailored testing procedure.



PROGRAMS 



Two FORTRAN programs were used for investigating effects of program
parameters and item pool attributes. The input variables for both programs
included: a) acceptance range, b) stepsize, c) item pool size, d) item
difficulty values for the various sizes and shapes of item pools, and
e) the true abilities for which an estimate was to be made utilizing the
program parameters and item pool provided. Both programs output the mean
and standard deviation of the estimates of each true ability provided.
These served as dependent measures for determination of the quality of
estimation for the specific values of the acceptance range, stepsize,
and item pool difficulties.

The first program, the TREE1P, was based on the concept of a propensity
distribution. A propensity distribution in this context was defined as
the probability distribution for observed ability estimates given a true
ability, P(θ̂ | θ) (Lord and Novick, 1968). The concept of a propensity
distribution was extended from its use in true score theory to the context
of latent trait ability estimation. The TREE1P program determined the
propensity distribution for a given true theta, θ, analytically from the
properties of the tailored testing model.

Briefly, the TREE1P program operated as follows. Initially an item
of average difficulty was administered to the simulated examinee with known






true ability. Based on the probability function for the simple logistic
model,

$$P(u \mid \theta, b) = \frac{e^{u(\theta - b)}}{1 + e^{\theta - b}}$$

where u is the item score (0 or 1), b is the item difficulty parameter,
and θ is the ability parameter, the probability of a correct and the
probability of an incorrect response were obtained. If the response was
correct, the ability estimate was increased by the stepsize. If the response
was incorrect, the ability estimate was decreased by the stepsize. Thus after
one item was administered, two paths or branches were present on the "tree".
(The tree diagram from probability theory was employed to represent the
propensity distribution in this study.) Based on these first possible
ability estimates, the closest items to each of the two estimates were
selected for administration with the constraint that the difficulty of
the items must have been within plus or minus the acceptance range from
the present ability estimates. If no items were available to the program
from the item pool provided, that branch was terminated at that point.
However, assuming items were available, there existed four possible paths
after the second item had been administered. As long as all correct or
all incorrect responses were obtained on a given path, the ability estimates
continued to be increased or decreased, respectively, by the stepsize.
However, when both a correct and an incorrect response were present on a
particular path of the tree, a maximum-likelihood ability estimation
procedure obtained an ability estimate using an iterative search for the mode
of the likelihood distribution.
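
The two computations this description relies on, the logistic response
probability and the maximum likelihood estimate for a mixed response
string, can be sketched in Python. This is a minimal illustration, not the
authors' FORTRAN: the function names are hypothetical, and bisection on the
likelihood equation stands in for the unspecified "iterative search for the
mode" (for the one-parameter model the likelihood has a single mode once
both a correct and an incorrect response are present).

```python
import math


def p_correct(theta, b):
    """Probability of a correct response (u = 1) under the simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ml_estimate(difficulties, scores, lo=-10.0, hi=10.0, tol=1e-8):
    """Maximum likelihood ability estimate by bisection.

    Solves sum(u - P(theta, b)) = 0, which locates the mode of the
    likelihood. The search bounds lo and hi are assumptions.
    """
    if all(scores) or not any(scores):
        raise ValueError("need at least one correct and one incorrect response")

    def score_fn(theta):
        return sum(u - p_correct(theta, b) for u, b in zip(scores, difficulties))

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_fn(mid) > 0:
            lo = mid  # observed score exceeds expected score: estimate too low
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(p_correct(0.0, 0.75), 2))
print(ml_estimate([0.0, 0.75], [1, 0]))
```

For a correct response to an item of difficulty 0.0 followed by an
incorrect response to an item of difficulty 0.75, the estimate comes out
close to 0.375, the midpoint of the two difficulties.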



Insert Figure 1 about here 



To partially illustrate how the propensity distribution was determined
by the TREE1P, Figure 1 shows a diagram representing the operation of
the procedure on a nine item rectangular pool. The stepsize used for this
illustration was 1.0 and the acceptance range was 0.3. The θ for this
analytical derivation of the propensity distribution was set at zero.
As was pointed out above, the procedure began by administering an item
of average difficulty from the pool, i.e., the item with the difficulty
parameter 0.0. The probability of a correct response, as determined by
the probability function given above for the simple logistic model, was
0.5 and the probability of an incorrect response was 0.5.

After a correct response the ability estimate was increased by the
stepsize, or after an incorrect response, it was decreased by the stepsize.
Thus after one item, the ability estimate was either 1.0 with probability
of 0.5 or -1.0 with a probability of 0.5. This procedure was followed
so that finite ability estimates would be available after each item response,
rather than the ±∞ value given by the maximum likelihood procedure.
The expected value of the distribution after one item was 0.0 and the
standard deviation was 1.0.

Based on these first possible ability estimates the closest items
were selected from the pool with the restriction that they must have been
within plus or minus 0.3 of the requested items. Thus, as Figure 1
illustrates, items with parameter estimates of plus and minus 0.75 were
administered to the estimated abilities plus and minus 1.00 respectively. On
the upper branch of the tree, a correct response yielded an ability estimate
that was again increased by the stepsize, since a maximum likelihood estimate
could not be determined without both a correct and incorrect response.
Now, the ability estimate was 2.0. The probability of a correct response
to the item with the 0.75 difficulty parameter was 0.32. The lower branch
of the tree was the same except for the change in sign of the item
parameters and ability estimates. When the item pool distribution being
considered was symmetric, the results of the analyses were the same except
for the change in sign.

Following the middle branches of the tree, an incorrect response to
the item with difficulty 0.75 yielded an ability estimate of 0.375 from
the maximum likelihood technique. The probability of this response was
0.68 based on the model. When the first item was missed and the second
answered correctly, the probability of the second response was also 0.68.
By the local independence assumption of the model, the probability of
a +2.0 estimate was 0.5 x 0.32 = 0.16 while the probability of +0.375 was
0.5 x 0.68 = 0.34. In this manner the propensity distribution after two
items had been administered could be obtained. As noted at the bottom
of Figure 1, the expected value was still 0.0 and the standard deviation
(which was determined as the square root of Var(θ̂)) was 1.174.
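
The two-item figures quoted above can be reproduced with a few lines of
Python. This is only a check of the arithmetic for the Figure 1 example
(true ability 0, stepsize 1.0, items at 0.0 and ±0.75); the variable names
are illustrative, and the ability estimates are entered by hand rather than
derived by a tree search.

```python
import math


def p_correct(theta, b):
    """Probability of a correct response under the simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


theta = 0.0
p_first = p_correct(theta, 0.0)    # first item, difficulty 0.0
p_upper = p_correct(theta, 0.75)   # second item after a correct response
p_lower = p_correct(theta, -0.75)  # second item after an incorrect response

# (ability estimate, probability of reaching it) after two items
outcomes = [
    (2.0, p_first * p_upper),                  # correct, correct: two steps up
    (0.375, p_first * (1.0 - p_upper)),        # correct, incorrect: ML estimate
    (-0.375, (1.0 - p_first) * p_lower),       # incorrect, correct: ML estimate
    (-2.0, (1.0 - p_first) * (1.0 - p_lower)), # incorrect, incorrect: two steps down
]

expected = sum(est * prob for est, prob in outcomes)
variance = sum(prob * (est - expected) ** 2 for est, prob in outcomes)
std_dev = math.sqrt(variance)
print(expected, std_dev)
```

The expected value is zero to within rounding error and the standard
deviation is approximately 1.174, in agreement with the values reported
for Figure 1.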

The tree developed further in this same manner whenever items within
the acceptance range were available. If all correct or incorrect responses
were present, the fixed stepsize was used to make ability estimates. Once
a mixture of correct and incorrect responses was present, the maximum
likelihood ability estimate procedure was used. Note the "branches" of
Figure 1 were "live" at the ±2.00 ability estimates but no items existed in
the pool within ±0.3 of the ability estimates ±0.375. Therefore, those
branches terminated.

The tree continues to develop by following all "live" paths. The
program is finished after all branches are terminated by the condition
that no items of appropriate difficulty are available in the pool. One
may well imagine that as the number of items in the pool gets larger,
the procedure is, practically speaking, bounded by the storage capacity
of the computer facility and the magnitude of one's computer budget. For
the IBM 370/168 system on which the TREE1P program was run, it was found
that sixty-one items was the practical upper limit on the number of items
the pool could contain for any particular run of the various combinations
of stepsize, acceptance range, and shape of the item difficulty distribution.

Due to the limitation on size of the item pool which could be
investigated with the TREE1P program, the second computer program, SIM1P, was
developed. This program was adapted from the tailored testing procedure
based on the Rasch model which was already operational. This particular
tailored testing procedure has been described thoroughly elsewhere (Reckase,
1974), so only the details pertinent to this research have been described
here. The SIM1P program followed only one path for any given θ in contrast
to the TREE1P. A particular path was selected using Monte Carlo simulation
techniques. It provided for investigation of the properties of bias and
variance of ability estimation with much larger item pools since the required
storage and computation were substantially reduced as compared to the
TREE1P program.

The following values served as input to the program: the stepsize,
acceptance range, item pool difficulty values, θ, and number of simulated
tests to be administered by the tailored testing procedure. The procedure






initially administered an item of average difficulty from the pool of items
provided. If a correct response was obtained, an item that was more difficult
by a stepsize factor was administered. If an incorrect response was obtained,
an easier item by the same value was administered. This fixed stepsize
up and down procedure continued until both a correct and incorrect answer
had been obtained in the response string. Then the procedure switched
from the fixed stepsize procedure to maximum likelihood ability estimation.
In both cases, items were selected to maximize the item information.
Ability estimation was accomplished after each item was administered
(provided correct and incorrect responses had previously occurred) by the
maximum-likelihood estimation procedure using an iterative search for the
mode of the likelihood distribution. The items administered had to be
within plus or minus the acceptance range from the requested item difficulty.
If no items were available within this range of the estimated ability,
the procedure stopped. The only other stopping rule was based on a preset
maximum number of items that was to be administered.

Items were scored correct or incorrect by the SIM1P program utilizing
an internal random number generator. First, the probability of a correct
response was computed using the formula for the probability function of
the simple logistic model stated earlier. The theta for this computation
was the true theta that was input into the program, and the difficulty
parameter, b, was that of the item just administered to the simulated
examinee. After this probability of a correct response had been determined,
the random number generator selected a number between zero and one from
a rectangular distribution. If this randomly selected number was less
than or equal to the probability of a correct response, the item was scored
correct. If the randomly selected number was greater than the probability
of a correct response, the item was scored as incorrect. Provided both
incorrect and correct responses had previously occurred, an ability
estimate was made, and the next item administered was selected to maximize
information for this estimated ability. This procedure continued until
one of the stopping rules was encountered.
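
The simulation loop described in the last two paragraphs can be sketched in
Python as follows. This is a hedged reconstruction for illustration, not
the SIM1P source: all function names are hypothetical, bisection replaces
the unspecified iterative mode search, the first request is assumed to be
the mean pool difficulty, items are assumed to be used at most once, and
the seed, pool, and parameter values in the demonstration are arbitrary.

```python
import math
import random
import statistics


def p_correct(theta, b):
    """Probability of a correct response under the simple logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ml_estimate(difficulties, scores, lo=-10.0, hi=10.0, tol=1e-6):
    """Bisection on the likelihood equation sum(u - P) = 0 (mixed scores only)."""
    def score_fn(theta):
        return sum(u - p_correct(theta, b) for u, b in zip(scores, difficulties))
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score_fn(mid) > 0:
            lo = mid  # observed score exceeds expected score: estimate too low
        else:
            hi = mid
    return (lo + hi) / 2.0


def nearest_unused(target, pool, used, acceptance_range):
    """Index of the closest unused item within the acceptance range, else None."""
    candidates = [(abs(b - target), i) for i, b in enumerate(pool)
                  if i not in used and abs(b - target) <= acceptance_range]
    return min(candidates)[1] if candidates else None


def simulate_test(true_theta, pool, stepsize, acceptance_range,
                  max_items=20, rng=random):
    """One simulated administration; returns (final estimate, items given)."""
    estimate = sum(pool) / len(pool)       # request an item of average difficulty
    used, given, scores = set(), [], []
    while len(given) < max_items:          # stopping rule: preset maximum
        i = nearest_unused(estimate, pool, used, acceptance_range)
        if i is None:                      # stopping rule: no acceptable item
            break
        used.add(i)
        # Score correct when a uniform draw is <= the model probability
        u = 1 if rng.random() <= p_correct(true_theta, pool[i]) else 0
        given.append(pool[i])
        scores.append(u)
        if 0 < sum(scores) < len(scores):  # both response types observed
            estimate = ml_estimate(given, scores)
        else:                              # fixed stepsize phase
            estimate += stepsize if u == 1 else -stepsize
    return estimate, len(given)


random.seed(1979)                           # arbitrary seed for repeatability
pool = [-3.0 + 0.1 * k for k in range(61)]  # 61-item rectangular pool
results = [simulate_test(1.0, pool, 0.693, 0.3) for _ in range(25)]
estimates = [est for est, _ in results]
print(statistics.mean(estimates), statistics.stdev(estimates))
```

The mean and standard deviation over the 25 administrations play the role
of the dependent measures described for SIM1P. Whether the original
program divided by n or n - 1 when computing the standard deviation is not
stated, so the use of `statistics.stdev` here is an assumption.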

The major controlling program parameters for both the TREE1P and
SIM1P were the stepsize and acceptance range values. The stepsize
parameter controlled how quickly the procedure would move through the item
pool while the acceptance range parameter specified how discrepant items
could be from those desired and still be administered. The acceptance
range also indirectly determined the number of items from the pool which
were available for administration. Clearly, the wider the acceptance
range, the greater the number of items that could have been chosen
for administration.

The TREE1P and SIM1P programs used in this study for determining the
optimal stepsize, acceptance range, item pool size, and item pool
distribution were similar in that both output the mean and standard deviation
of ability estimates for each true theta input. However, they differed
in the manner in which the mean and standard deviation were determined.
While the TREE1P pursued all possible paths through the item pool, the
SIM1P followed only the path that was the result of the simulated
interaction of an examinee with the tailored testing procedure. The mean and
standard deviation from the TREE1P were actually expected values and square
roots of variance computed from probabilities arising from the one-parameter






model and ability estimates arising from the maximum likelihood estimation
technique. The SIM1P program provided a mean and standard deviation of
the set of ability estimates obtained for each of the θs specified.

METHODS 

To investigate the optimal stepsize, acceptance range, item pool
size, and item pool shape, nearly all possible combinations of the
following were input into the TREE1P and SIM1P programs for true abilities -3,
-2, -1, 0, 1, 2, and 3. The stepsize values used were .3, .4, .5, .6,
.693, .8, .9, 1.0, 1.5, 2.0, and [illegible], while acceptance ranges were
.2, .3, .4, and .5. Item pool sizes were 9, 13, 25, 31, 61, 72, 180,
and 181. Item pool shapes investigated were normal, rectangular, bimodal,
and skewed, with difficulty values constrained between plus and minus
three. Idealized item pools (difficulty values in the above shapes with
spacing dependent on shape and size of item pool) were constructed and
used as input to the programs, as well as actual item pools (test items
calibrated and formed into pools with less constraint on having items
equally spaced along the difficulty scale).

The manner in which item pool size effects were investigated using
simulations was to run the TREE1P and SIM1P programs on the various sized
pools mentioned above. With the resulting data, plots and projections
were made to estimate the item pool sizes needed for various accuracies
of ability estimation.

The comparisons to determine the optimal combination of independent
variables were based upon the mean and standard deviation of twenty-five
simulated administrations of a tailored test to each θ using the SIM1P,
whereas for the TREE1P program, the comparisons were of the expected value
of θ̂, E(θ̂), and the standard deviation of θ̂, √Var(θ̂). Values of these
dependent variables were compared across program runs using various sized
item pools holding stepsize and acceptance range constant. They were also
compared from runs using various shapes of item pools, holding size of
item pool, stepsize, and acceptance range fixed. Additionally, comparisons
were made of the dependent variables, first varying stepsize with all
other variables fixed, and then varying the value of the acceptance range
while holding all other variables constant. Since the TREE1P program
was considered to yield the most accurate values, i.e., E(θ̂) and √Var(θ̂)
based upon the propensity distribution, another comparison was deemed
important. Because the SIM1P means and standard deviations were subject
to sample variation, they were validated against values of the TREE1P
for various runs on the sixty-one item pool. Also, the number of estimates
of the true ability, i.e., the number of tailored tests administered to
each simulated examinee by the SIM1P program, was varied. This was done
to check whether an appropriate number of administrations had been used.

RESULTS 

The results of this study were to a great extent drawn from tables
which summarized the results of the TREE1P and SIM1P programs. One issue
to be investigated was to determine the distribution of item pool
difficulty parameters that yielded the least bias and standard error of ability






estimates across the range of ability from -3 to +3. Another important
question was how large an item pool was necessary to accomplish the goal
of accurate ability estimation. Thirdly, a determination of the
preferred magnitude of the stepsize parameter was desired. The fourth
outcome of this study was to decide upon the approximate value of the
acceptance range program parameter which would provide ability estimates
with the least bias and standard error. These were the primary targets
of the study.

Secondary goals of the study included a comparison of the performance
of actual versus ideal item pools. The contrast of these has been previously
described in the methods section. Another secondary objective was to compare
the results of the TREE1P and SIM1P programs. In this regard, two concerns
were investigated. One pertained to how close the SIM1P estimates of the
means and standard deviations of ability were to the E(θ̂) and Var(θ̂)
determined by the TREE1P. The importance of this particular concern related
to how well the SIM1P analyses on larger item pools provided reliable data
on the primary questions of this study. It should be recalled that the
motivation for development of the SIM1P program was to investigate the
research questions of the study on larger item pools than the TREE1P program
would realistically accommodate. The second concern subsumed under comparison
of the TREE1P and SIM1P programs was to decide whether or not twenty-five
estimates of each ability by the SIM1P was an adequate number. Several
analyses were run of the SIM1P program on various item pools from which
data had already been obtained from the TREE1P. By running the SIM1P
on these pools and holding all other variables fixed except the number
of test administrations, data were obtained pertaining to the adequacy
of the SIM1P estimates of the means and standard deviations. Another
matter along this same line was investigated with runs of the SIM1P on
some of the larger pools. This was the question of whether or not twenty
items as an upper limit of items administered by the tailored test was
adequate.

Item Pool Shape

The TREE1P program (propensity distribution technique) was used to
evaluate the effects of varying the shape of the item pool distribution
on ability estimation. The rectangular item pools were obtained simply
by selecting equally spaced items between +3.0 and -3.0 inclusive. The
normal item pools were constructed such that the items were equally spaced
in probability. That is, the area between item positions was kept constant
in the range from +3.0 to -3.0 standard deviation units in the normal
distribution. This procedure for producing the normally distributed
pools has the effect of selecting more items around the difficulty value
of zero and fewer items at the extremes. A similar procedure was used
in selecting the item parameters for the bimodal pools as was used for
selecting the normal pools. The negative half of the pool was centered
around -.693 and the area under the normal distribution was used to place
items around this point up to zero and down to -3.0. The same was true
for the positive half of the pool. The reason ±.693 were chosen as the
two modes of the bimodal distribution was that, prior to the construction
of a bimodal pool, .693 as a stepsize value had appeared promising.
Therefore, after the first item was administered at 0, the stepsize of .693
would move the ability estimate out to one of the more dense regions of



the pool whether the examinee correctly or incorrectly answered the first
item. The skewed item pool distribution of item parameters was constructed
via a similar procedure to that for the normal and bimodal pools. That is, the
items divided the distribution into equal areas. For the skewed pool,
tables of the Pearson Type III distribution were used. The pool constructed
was positively skewed (skewness = .5), and it should be noted that in any
table included in the report, a skewed distribution always indicates a
positive skew. However, the results would generalize to negatively skewed
pools.
Results concerning the shape of the item pool distribution may be
seen in Tables 1-6 for different combinations of values of the other variables.
However, Tables 4 and 6 point out the more general trends of the item
distribution study. In Table 6 the comparison of the normal and rectangular
pools of 25 items is shown for only acceptance ranges of 0.1 and 0.3 when
paired with stepsizes of 0.5 and 0.7 respectively. These values of acceptance
range and stepsize were chosen because they appeared to yield some of the
least bias and least variance estimates. Specifically, the acceptance
range of 0.1 was chosen to check whether the more dense item parameters
near the middle of the normal distribution would make the use of the smaller
acceptance range desirable.

As can be seen from Table 6, the normal distribution appears to be
inferior to the rectangular item distribution in almost all cases. Except
for the 0.1 acceptance range data at the 0.5 and [illegible] ability levels,
either the expected values deviate more from θ or the standard deviations are
larger, or both. It is interesting to note that even the estimates at
ability 0.0 are not as good for the normally distributed pool as for the
rectangular pool, even though more items are present for estimation of
ability.

Table 4 shows the expected values and standard deviations from the
TREEIP on normal, bimodal, rectangular, and positively skewed pools.
Each of these contained sixty-one items. The stepsize was fixed at 0.693,
and the acceptance range was held at 0.30 for all runs. Again the
rectangular pool performed better overall than did the other shapes of item
difficulty distributions. For true abilities zero and one, the standard
deviation of ability estimates, as well as the bias of the estimates, was
smallest for analyses using the rectangular pool. At the ability levels
of two and three, the rectangular pool yielded estimates with less bias
in the expected values but larger standard deviations than the other shaped
pools.
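
The comparison in this paragraph can be checked directly from the Table 4 entries. The sketch below is an illustration only: the values are copied from Table 4, bias is taken to be E(θ) minus the true ability, and the names `TABLE_4` and `smallest` are hypothetical.

```python
# (expected value, standard deviation) pairs copied from Table 4,
# keyed by pool shape and then by true ability level.
TABLE_4 = {
    "normal":      {0: (-0.000, 0.866), 1: (1.256, 0.873), 2: (2.267, 0.693), 3: (2.840, 0.543)},
    "bimodal":     {0: (-0.000, 0.857), 1: (1.245, 0.876), 2: (2.281, 0.688), 3: (2.852, 0.525)},
    "rectangular": {0: (-0.001, 0.610), 1: (1.008, 0.677), 2: (2.138, 0.805), 3: (3.111, 0.566)},
    "skewed":      {0: (0.040, 0.815),  1: (1.282, 0.858), 2: (2.257, 0.670), 3: (2.801, 0.561)},
}

def smallest(ability, measure):
    """Return the pool shape with the smallest |bias| or SD at one ability."""
    def score(shape):
        expected, sd = TABLE_4[shape][ability]
        return abs(expected - ability) if measure == "bias" else sd
    return min(TABLE_4, key=score)

for ability in (0, 1, 2, 3):
    print(ability, smallest(ability, "bias"), smallest(ability, "sd"))
```

At ability 0 the three symmetric pools are all within 0.001 of zero bias, so the ordering of bias there is not meaningful.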

The results obtained from the TREEIP would have been the same for
the negative end of the ability continuum when the pools were symmetric.
Therefore, only the positive values of ability were run for the normal,
rectangular and bimodal pools. However, for the skewed pool containing
sixty-one items, the negative ability values of -1, -2, and -3 were run
using the same program parameters as were indicated in Table 4. The results
were as follows. For -1, E(θ) = -1.189 and S_θ = 0.836. For -2,
E(θ) = -2.249 and S_θ = 0.761. For -3, E(θ) = -2.935 and S_θ = 0.577.
Even if one considered this skewed pool as being better suited for ability
levels around minus two to minus one, since it contained more items around
that region, it did not perform better than the rectangular pool.

Overall conclusions about the most preferred item distribution were
that the rectangular pool was most apt to yield the least bias and smallest
standard error of ability estimates across the ability scale. One important






caution is suggested by some of the results. When setting up an item
pool for use with tailored testing procedures (especially those having
a parameter comparable to the acceptance range), it is important to look
carefully at the frequency distribution to be assured that no substantial
gaps exist in any area of the continuum. Otherwise, one may expect poor
estimation of ability at that point. One should view the estimates of true
ability 3.0 as understandably limited, inasmuch as the item pools did not
have any items beyond difficulty 3.0. For best estimation of ability, the
pool should have a dense uniform distribution of items around the ability
level to be estimated.

Item Pool Size

The criteria for judging how large an item pool was needed for the
tailored testing procedure were again the bias and standard error of ability
estimates. The results of the simulations using both the TREEIP and SIMIP
programs have been condensed, and the general trend has been illustrated
in Figure 2. The values of E(θ) and S_θ which have been plotted for
item pools of size 9, 13, 25, 31, and 61 were obtained from the TREEIP.
Each of these pools had a rectangular distribution of item difficulty
parameters. The item pools with 72 and 180 items were actual tailored
testing item pools. The pool of 72 items consisted of item difficulty
parameters from the calibration of a set of vocabulary items. This pool
was named VCIPL. The pool with 180 items was constructed using item
difficulty parameters resulting from calibrations of items covering the
evaluation techniques portion of an introductory measurement and evaluation
course. This pool was known as ETIPL. The distributions of item difficulty
for VCIPL and ETIPL were graphed in Figure 3 and Figure 4 respectively. The
means and standard deviations of ability estimates on the SIMIP runs on
these latter pools have been included in the plots of Figure 2. Each analysis
represented in this figure had θ set equal to 1.0, the stepsize fixed
at 0.693, and the acceptance range equal to 0.3.

The top graph of Figure 2 illustrates that as item pool size reaches
61 for this particular set of analyses, E(θ) is equal to θ. The bias
of the ability estimates is essentially zero. The bottom graph of Figure
2 shows that as item pool size increases, the standard error becomes smaller.
While these plots should be considered as rough approximations of the
relationship between item pool size and ability estimate bias and standard
error, the indication appears to be that with a uniform distribution of
item difficulty, θ = 1, and the program parameters equal to the values
used here, one could expect very little bias and a standard error of about
0.X with an item pool consisting of around 200 items. More will be presented
on item pool size in the discussion section of this paper.



Stepsize

The results of the study of the preferred magnitude of the stepsize
program parameter may be seen in Tables 1, 2, 3 and 7. Tables 1, 2, and
3 give the E(θ) and S_θ from TREEIP analyses of θ = 0, 1, 2, and 3 using
item pools of size 9, 13, 25, 31, and 61. Tables 1, 2, and 3 show results
for the rectangular, normal, and bimodal distributions of item difficulty
parameters respectively. Table 7 presents the results of the SIMIP analyses
on the ETIPL item pool for θ = -3, -2, -1, 0, 1, 2, and 3. Negative θ
values are not shown in Tables 1, 2, and 3 since the results of the TREEIP
on the pools used are the same as for the positive θ values except for
the change of sign on the E(θ)s. This should be expected since the item
pool distributions of item difficulty are symmetric around zero. The
acceptance range for all analyses for Tables 1, 2, and 3 was 0.30. For
the SIMIP analyses of the ETIPL, a substantially larger item pool, a smaller
acceptance range, 0.25, was used, as is noted at the bottom of Table 7.
Another variable recorded in Table 7 is the mean number of items
administered for the 25 tests simulated by the SIMIP for each ability level.
The maximum number of items per simulated test was 20 for these SIMIP
analyses.

In general, results presented in Tables 1, 2, and 3 suggest that
stepsizes between 0.5 and 1.0 give fairly unbiased estimates, and also
have the smallest standard errors. Larger stepsizes tend to have a positive
bias and larger standard errors. From several graphs like the ones presented
in Figure 5, the stepsize value of 0.693 appears to be the best overall
compromise value which achieves less bias while holding the standard error
down. Figure 5 shows the E(θ) and S_θ for the 31 item rectangular, normal,
and bimodal pools when θ = 1.0 and the acceptance range equals 0.30.
Due to the cost of running the TREEIP on larger item pools, not all cells
of Tables 1, 2, and 3 for the 61 item pools have been analyzed.

Table 7 of the SIMIP on the ETIPL pool suggests that a stepsize between
0.4 and 0.7 is probably better for less bias and standard error. It should
be recalled that the SIMIP is subject to sample variation, but in general,
the results seem to suggest that a stepsize of about 0.7 is appropriate.
However, a trend which should be investigated further is that larger item
pools seem to do better with smaller stepsizes and vice versa.

Acceptance Range 

The results of the acceptance range study are given in Tables 8, 9,
and 10. Table 8 presents the E(θ) and S_θ for stepsizes 0.5, 0.693, 1.0,
and 1.5; acceptance ranges 0.1, 0.2, 0.3, and 0.4; and ability levels
0.0, 1.0, 2.0, and 3.0 from TREEIP analyses. All of the results in Table
8 are based on the 25 item rectangular pool. From Table 8 it can be seen
that in most cases, as the acceptance range increases, the standard
deviation decreases. This is a reasonable result since more items are
available for administration with a larger acceptance range. However, there
is also a trend present in the amount of bias in the estimate as the
acceptance range increases, particularly at the higher ability levels and
for the larger stepsizes.
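
The role of the acceptance range in item selection can be illustrated with a short sketch. It is not the TREEIP or SIMIP code: the function name `select_item`, the nearest-item rule, and the evenly spaced 25-item pool from -3.0 to 3.0 are all assumptions made for the example.

```python
def select_item(target, pool, used, acceptance_range):
    """Return the index of the unused item closest to `target`, or None.

    An item qualifies only if its difficulty lies within
    `acceptance_range` of the requested difficulty. Returning None
    models the case where no acceptable item remains and testing stops.
    """
    best = None
    for i, difficulty in enumerate(pool):
        if i in used:
            continue
        gap = abs(difficulty - target)
        if gap <= acceptance_range and (
            best is None or gap < abs(pool[best] - target)
        ):
            best = i
    return best

# Hypothetical rectangular pool: 25 items spaced 0.25 apart from -3.0 to 3.0.
pool = [-3.0 + 0.25 * i for i in range(25)]
print(select_item(0.693, pool, used=set(), acceptance_range=0.30))
print(select_item(3.4, pool, used=set(), acceptance_range=0.30))
```

With a wider acceptance range more items qualify for each request, which is consistent with the observation that more items are available for administration.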

Table 9 shows the results of the SIMIP on the VCIPL pool: 25 test
administrations per ability level; 20 item upper limit; stepsize = .693;
and θ = -3, -2, -1, 1, 2, and 3. The mean number of items is also
indicated. These results indicate that an acceptance range of 0.30 is
probably the best compromise value for minimizing bias and standard error
of ability estimates across the range of θ. Table 10 shows the results
of the SIMIP on the ETIPL pool: 25 test administrations per ability level;
40 item upper limit; stepsize = .693; and θ = -3, -2, -1, 0, 1, 2, and 3.
Again, the mean number of items is indicated. These results on ETIPL
are somewhat more ambiguous, although the extreme acceptance range values
are clearly inferior to the more moderate values of .2 to .4. In cases
such as this, one should consider a combination of the density of the






item pool across the range of θ and whether a particular range should
be estimated more precisely than others, in order to decide on the best
acceptance range value. Decisions regarding the best value of program
parameters cannot be made independent of considerations such as the size
and shape of the item pool to be used.

Secondary Results

Secondary results include the comparison of the performance of actual
versus ideal item pools previously discussed. Table 11 shows this
comparison, and overall, the ideal pool did not perform much better than the
ETIPL pool.

Another comparison was of the SIMIP and TREEIP programs on the same
pools using the same program parameter values. By looking at Table 4 and
Table 5, one may see that the SIMIP did a reasonably good job of
approximating the TREEIP results at θ = 2 for the bimodal and skewed pools.
Also, from Table 5, it can be seen that increasing the number of tests
administered by the SIMIP did not dramatically change the means and standard
deviations. Therefore, 25 administrations seemed adequate.

Finally, by comparing cells of Tables 7 and 10, one can see that
increasing the maximum number of items administered from 20 to 40 does
not substantially change the means and standard deviations from the SIMIP.
This comparison is not exact because the acceptance range of 0.25 used
for analyses in Table 7 does not precisely equal the value of 0.2 or 0.3
for acceptance range in Table 10. Neither is the stepsize of 0.7 in Table
7 exactly equal to 0.693, used in Table 10. However, the values seemed close
enough to make a comparison, and the result of this comparison seemed to
indicate that 20 items as an upper limit was adequate. Note that the mean
number of items recorded in both tables illustrated that the procedure
approached the upper limit in the middle range of θ.

DISCUSSION 

TREEIP

A possible explanation for the larger standard deviation given by
analyses run on the rectangular pools at the more extreme values of the
ability continuum was suggested by a close look at the development of
the propensity distribution by the TREEIP for the various shaped item
pools.

A property of the TREEIP and the manner in which it developed the
propensity distributions was that the standard deviation actually increased
as more branches or levels resulted from items administered to more and
more possible ability estimates. This increase of the standard deviation
of ability estimates stabilized and converged for the smaller item pools
as the paths or branches of the "tree" terminated. For the larger pools
(especially the sixty-one item pools), the standard deviation initially
increased but, as branches were terminated, the standard deviation came
down. This pattern of increasing standard deviation of ability estimates
during the early formulation of the propensity distribution was evident
for all shapes of the distributions of items in the pools.
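
The idea of building a propensity distribution by following every branch of the response tree can be sketched as below. This is a deliberately simplified illustration and not the TREEIP itself: it assumes an unlimited supply of items at every requested difficulty, no stopping rule, a fixed stepsize throughout, and it uses the last requested difficulty as a stand-in for the ability estimate. The name `propensity_moments` is hypothetical.

```python
import math

def rasch_probability(theta, difficulty):
    """One-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def propensity_moments(theta, stepsize, n_items, start=0.0):
    """Enumerate every response path and return (mean, SD) of the endpoints.

    Each node is (requested difficulty, probability of reaching it).
    A correct response moves up by `stepsize`, an incorrect one down.
    """
    nodes = [(start, 1.0)]
    for _ in range(n_items):
        next_nodes = []
        for difficulty, prob in nodes:
            p_correct = rasch_probability(theta, difficulty)
            next_nodes.append((difficulty + stepsize, prob * p_correct))
            next_nodes.append((difficulty - stepsize, prob * (1.0 - p_correct)))
        nodes = next_nodes
    mean = sum(d * p for d, p in nodes)
    variance = sum(p * (d - mean) ** 2 for d, p in nodes)
    return mean, math.sqrt(variance)

# Mean and SD of the endpoint distribution as the tree grows deeper.
for depth in (1, 2, 4, 8):
    print(depth, propensity_moments(theta=1.0, stepsize=0.693, n_items=depth))
```

Because every path carries its exact probability, the resulting mean and standard deviation are free of sampling error, which is the property that distinguishes a tree enumeration from a simulation.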

However, the patterns of convergence to the final standard deviations
yielded by the TREEIP were different for the various shapes of item pools.
There was a tendency for the standard deviations at true abilities
zero and one to be larger for the normal and bimodal pools than for the
rectangular pool, whereas at abilities two and three the standard deviations
of the estimates were generally larger for the rectangular pool
across most of the TREEIP analyses. The explanation proposed was that,
when more items were available for administration to the more extreme
ability levels, as when the rectangular pool was used, the
standard deviation of ability estimates was larger since the standard
error was more accurately estimated. The standard deviations of the
estimates from the normal and bimodal pools for these true ability levels
were smaller, since paths or branches were often terminated because no
items were available within the acceptance range of the estimated abilities.
In short, when fewer items were in the pool around a particular true ability,
there were fewer paths allowed to develop in the propensity distribution
due to the stopping rules. Therefore, the standard deviation of ability
estimates at that particular level was an underestimate. A logical check
for this phenomenon was the prediction that when the acceptance range was
made smaller, the drop in standard deviations for the more extreme ability
levels would be more pronounced with the normal pool than for the rectangular.
This did appear to be the case. The point is that the smaller standard
deviations for ability levels two and three yielded by the TREEIP when
normal or bimodal pools were used probably should not be weighted too
heavily, as the tendency appears to be somewhat of an artifact of the
procedure. The values obtained for the rectangular pools may well be
more representative.



SIMIP

SIMIP was designed to score and administer items in the manner previously
described, based on the rationale that this approach was a reasonable
simulation of the behavior of an examinee when interacting with a tailored
test. The pseudo examinee with some specified true ability was presented an
item of average difficulty from the pool because, given no prior information
about his ability, the best guess of an item appropriate for the
examinee was one of average difficulty. Scoring of each item using the
examinee's θ in the one-parameter formula and then selecting a random
number from a rectangular distribution between zero and one was deemed
a reasonable simulation assuming the one-parameter model was correct.
Clearly, the larger the probability of a correct response was, the greater
the chance was that the random number generated was less than or equal
to the probability of a correct response specified by the model. However,
there was ample provision for the reality that occasionally an examinee
with adequate ability to answer an item correctly will still respond
incorrectly and vice versa. While the probability of a correct response was
computed using the examinee's true theta, item selection procedures used the
fixed stepsize until correct and incorrect responses were present, and then
selected the item maximizing information for the estimated ability. This
approach rounded out the simulation of the interaction between examinee
and tailored test with respect to the SIMIP.
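
The scoring rule just described can be written down compactly. The following is a minimal sketch, assuming the standard one-parameter logistic form P = 1 / (1 + exp(-(θ - b))); the function names, the seed, and the use of Python's `random` module are choices made for the illustration and are not taken from the SIMIP program.

```python
import math
import random

def rasch_probability(theta, difficulty):
    """One-parameter logistic probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def simulate_response(theta, difficulty, rng):
    """Score the item correct when a uniform draw on [0, 1) is less than
    or equal to the probability given by the model."""
    return rng.random() <= rasch_probability(theta, difficulty)

rng = random.Random(1979)  # fixed seed so the run is repeatable
n_trials = 10_000
correct = sum(simulate_response(1.0, 0.0, rng) for _ in range(n_trials))
# The observed proportion should be close to rasch_probability(1.0, 0.0).
print(correct / n_trials, rasch_probability(1.0, 0.0))
```

Because the draw is random, an examinee whose ability exceeds the item difficulty still misses the item on a share of the trials, which is the provision for unexpected responses mentioned above.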



Item Pool Size



The methods employed for the investigation of the effects of item
pool size on the operation of the one-parameter maximum likelihood tailored
testing procedure were simulations, but theoretical methods have also been
proposed. Lord (1970) suggested a formula for the number of items required
for a fixed stepsize procedure (selecting items more difficult by the
stepsize when correct responses were given and vice versa). The formula is

$$N = \left(1 + \frac{R}{d}\right)\left(n - \frac{R}{2d}\right) \tag{2}$$

where R is the range of item difficulties desired, d is the stepsize
and a submultiple of R, and n is the maximum number of items to be
administered. For example, if R were plus three to minus three, d were set
at 0.5, and n were twenty, the formula would give

$$119 = \left(1 + \frac{3.0}{0.5}\right)\left(20 - \frac{3.0}{2 \times 0.5}\right). \tag{3}$$

With this set of values, 119 items would be required if the exact item
requested were to be available.
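
The arithmetic of the worked example can be reproduced as follows. This is only a sketch: the function name `lord_pool_size` is hypothetical, and R is set to 3.0 because that is the value substituted in the worked example in the text.

```python
def lord_pool_size(R, d, n):
    """Number of items needed by a fixed-stepsize procedure.

    R: range of item difficulties (3.0 in the worked example),
    d: stepsize, n: maximum number of items administered.
    """
    return (1 + R / d) * (n - R / (2 * d))

# Values from the worked example: R = 3.0, d = 0.5, n = 20.
print(lord_pool_size(R=3.0, d=0.5, n=20))
```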

This formula does not directly apply to some tailored testing
procedures which use a variable rather than fixed stepsize. Also, most
testing procedures allow administration of slightly discrepant items from
those requested by the procedure (the acceptance range specifies how
discrepant). Procedures using a variable stepsize tend to require more
items because, as the procedure converges to an ability estimate, the
stepsize in effect becomes smaller and smaller. Allowing items to be
administered which differ slightly from the requested item compensates
to an extent for the increase in number of items caused by the variable
stepsize. Another limitation of the formula is that several tailored
testing procedures administer items until a specified precision is
reached instead of using a preset maximum number of items as a stopping
rule.

Another theoretical method of estimating how large an item pool should
be is to determine the number of items required to reach a specified
precision of ability estimation, given that equally spaced, perfectly
discriminating items are available. With these ideal or optimal
circumstances, the precision of an ability estimate is equal to the
difference between adjacent items. For example, an item pool with seven
equally spaced items from -3.0 to +3.0 would classify examinees into
categories 1.0 scale unit apart. The number of item responses required to
make the classification would be

$$k = \log_2 n \tag{4}$$

where n is the size of the item pool, since $2^k$ is the number of branches
in the tree diagram after k items are administered. By specifying the
precision desired, e, the minimum item pool size can be determined as
the range of ability, divided by e, plus one.






The minimum number of items administered to classify all ability levels
in the tailored testing situation is

$$k = \log_2\left(\frac{R}{e} + 1\right) \tag{5}$$

where R here is the range of ability and e is the precision desired.
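
The two quantities can be computed for the seven-item example given earlier. The sketch below rests on assumptions: the function names are hypothetical, the pool size is taken as the range of ability divided by the precision plus one, and the session length is rounded up to a whole number of items, a rounding rule the text does not state.

```python
import math

def min_pool_size(ability_range, precision):
    """Equally spaced items needed to cover the range at the given spacing."""
    return ability_range / precision + 1

def min_session_length(ability_range, precision):
    """Fewest responses that separate every category, rounded up."""
    return math.ceil(math.log2(min_pool_size(ability_range, precision)))

# Seven equally spaced items from -3.0 to +3.0, one scale unit apart.
print(min_pool_size(6.0, 1.0), min_session_length(6.0, 1.0))
```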

Some results obtained by the application of the formulas based on
the theoretical method for estimating the number of items needed in a
pool, given the precision desired, have been indicated in Table 12. The
requirements for pool size were computed for the range of ability,
[lower bound illegible] to +3.5, given the desired classification interval
size. As has been pointed out, these results are for a rectangular pool of
hypothetical items with perfect discrimination and zero guessing
probabilities. With these restrictions, the item pool sizes shown must be
regarded as lower limits. The minimum session length indicates the fewest
number of items that would have to be administered in order to classify an
ability level within the capabilities of the item pool. These also are based
on hypothetically perfect items and item pools, and should be considered as
lower limits. The values in the column labelled simulated length are the
number of items required to reach a best estimate using the most likely
response pattern simulated. All results in this column are based on
θ = 0.0.

In some cases the simulated session length is less than the minimum
predicted length because of the choice of ability level. Setting the
stepsize equal to 0.693 tends to keep the process near the middle of the
item pool, speeding up convergence for abilities near 0.0. If an ability
of 3.0 had been used, the session length would have been 6, well over the
minimum predicted values. Thus, the minimum session length refers to
the number of items needed across the ability range, and under specified
circumstances fewer items may be required.

These results using simulated tests have been compared to actual
tailored testing convergence plots and found to be fairly good
approximations (Reckase, 1976). One observation of importance is that, from
convergence plots, it can be seen that giving too many easy items causes
bias in ability estimation. Reckase (1975) has discussed this effect
in detail.

Generalizability 

It should be kept in mind that this report focused primarily on program
parameters and item pool attributes as they interacted with the one-parameter
maximum likelihood tailored testing procedure currently in operation for
this research project. Clearly, the inferences drawn from the results
should generalize to other tailored testing applications using similar
conceptual formulations of operation. In this sense, the results of this
study were intended not as isolated studies of item pool size and shape,
stepsize magnitude, and value of the acceptance range, but rather intended
to generalize to fairly concrete statements about the preferred operation
of a one-parameter tailored testing procedure. As was expected, item pool
attributes and program parameters interacted to a great extent in the
determination of the degree of bias and amount of variance in ability
estimation. The intention in drawing up the numerous tables and figures of
this report was to illuminate trends of interaction among these variables.
These trends,






in large part, were the primary thrust of this report. They should be
helpful in applying tailored testing procedures in which some of the
variables, such as item pool attributes, have been fixed by practicality.

In conclusion, this paper was intended as a guide for those setting
up a tailored testing procedure. The paper does not, by any means, exhaust
all the inferences that could be drawn from this set of data. The report
presented here has been an attempt to condense a more elaborate technical
report which is presently being developed. One point should be made in
closing. This strategy for investigating bias and standard error was
motivated by the need to determine these values across the ability
continuum, since our efforts were directed toward developing a
criterion-referenced tailored test. In criterion-referenced testing, it is
essential to know the effects of estimate bias and standard error on
decisions made at various points along the ability continuum.






REFERENCES

Birnbaum, A. Some latent trait models and their use in inferring an examinee's
ability. In F. M. Lord and M. R. Novick, Statistical theories of
mental test scores. Reading, MA: Addison-Wesley, 1968.

Jensema, C. J. Bayesian tailored testing and the influence of item bank
characteristics. Paper presented at the Invitational Conference on
Adaptive Testing, Washington, D.C., 1975.

Lord, F. M. Some test theory for tailored testing. In W. H. Holtzman
(Ed.), Computer-assisted instruction, testing, and guidance. New
York: Harper and Row, 1970.

Lord, F. M. and Novick, M. R. Statistical theories of mental test scores.
Reading, MA: Addison-Wesley, 1968.

Patience, W. M. Description of components in tailored testing. Behavior
Research Methods and Instrumentation, 1977, 9(2), 153-157.

Pine, S. M., and Weiss, D. J. A comparison of the fairness of adaptive
and conventional testing strategies (Research Report 78-1). Minneapolis:
University of Minnesota, Department of Psychology, Psychometric Methods
Program, 1978.

Reckase, M. D. An interactive computer program for tailored testing based
on the one-parameter logistic model. Behavior Research Methods and
Instrumentation, 1974, 6(2), 208-212.

Reckase, M. D. The effect of item choice on ability estimation when using
a simple logistic tailored testing model. ERIC Document Reproduction
Service No. ED 106 342, 1975.

Reckase, M. D. The effect of item pool characteristics on the operation
of a tailored testing procedure. Paper presented at the Spring Meeting
of the Psychometric Society, Murray Hill, New Jersey, 1976.

Weiss, D. J. Strategies of adaptive ability measurement. Research Report
74-5, Department of Psychology, University of Minnesota, December,
1974.

Wright, B. D. and Panchapakesan, N. A procedure for sample-free item
analysis. Educational and Psychological Measurement, 1969, 29, 23-48.






Table 1

Expected Values and Standard Deviations
from TREEIP on Rectangular Item Pools
Varying Pool Size, Step Size and Ability Level

                                    Ability Level
Pool   Step  ------ 0 ------ ------ 1 ------ ------ 2 ------ ------ 3 ------
Size   Size     E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ

   9   0.5    -0.000   0.645   0.405   0.603   0.709   0.482   0.877   0.335
       0.693  -0.001   1.025   0.756   1.113   1.593   1.217   2.388   1.139
       1.0    -0.001   1.155   0.821   1.213   1.685   1.298   2.548   1.286
       1.5    -0.001   1.182   0.934   1.268   1.966   1.439   3.016   1.423

  13   0.5    -0.001   0.765   0.655   0.937   1.577   1.219   2.599   1.201
       0.693  -0.001   0.976   0.733   1.056   1.587   1.217   2.454   1.168
       1.0    -0.001   1.187   1.037   1.150   1.995   1.085   2.822   1.005
       1.5    -0.006   1.125   0.899   1.249   1.960   1.463   3.045   1.424

  25   0.25   -0.001   0.547   0.584   0.809   1.606   1.200   2.783   1.190
       0.5    -0.001   0.736   0.857   0.842   1.933   1.000   2.964   0.809
       0.6     0.001   0.744   0.896   0.888   1.986   1.004   2.955   0.788
       0.693  -0.013   0.786   0.910   0.892   1.984   0.980   2.925   0.763
       0.8    -0.013  illeg.  illeg.  illeg.  illeg.  illeg.   3.045   0.845
       0.9    -0.001   0.845   0.996   0.895   2.061   0.972   2.996   0.784
       1.0    -0.001   0.829   0.990   0.901   2.099   1.036   3.135   0.867
       1.5    -0.001   0.972   1.109   1.086   2.318   1.221   3.389   1.040
       1.7    -0.001   1.473   1.329   1.417   2.477   1.116   3.143   0.614
       2.0    -0.001   1.551   1.389   1.553   2.673   1.348   3.535   0.846
       3.0    -0.001   1.555   1.361   1.741   2.863   1.930   4.248   1.750

  31   0.5     0.004   0.726   0.949   0.788   2.022   0.902   3.018   0.725
       0.693  -0.003   0.742   0.973   0.826   2.068   0.907   2.997   0.672
       1.0    -0.003   0.776   1.009   0.866   2.140   0.995   2.183   0.817
       1.5    -0.005   0.925   1.116   1.050   2.002   1.388   3.382   1.023

  61   0.5    -0.001   0.598   0.989   0.657   2.116   0.804   3.133   0.593
       0.693  -0.001   0.610   1.008   0.677   2.138   0.805   3.111   0.566
       1.0    -0.000   0.641   1.039   0.745   2.229   0.915   3.289   0.689
       1.5

Note. Acceptance Range = 0.30



Table 2

Expected Values and Standard Deviations
from TREEIP on Normal Item Pools
Varying Pool Size, Step Size and Ability Level

                                    Ability Level
Pool   Step  ------ 0 ------ ------ 1 ------ ------ 2 ------ ------ 3 ------
Size   Size     E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ

   9   0.5    -0.001   1.018   0.848   0.847   1.318   0.491   1.463   0.226
       0.693  -0.001   1.098   0.960   0.966   1.601   0.655   1.898   0.382
       1.0    -0.001   1.269   0.880   1.084   1.641   0.632   1.877   0.334
       1.5     0.000   1.500   0.693   1.330   1.142   0.972   1.358   0.636

  13   0.5    -0.001  illeg.  illeg.  illeg.  illeg.   0.514   1.922   0.237
       0.693  -0.001   1.101   1.002   0.942   1.648   0.628   1.932   0.358
       1.0    -0.000   1.273   1.146   1.020   1.760   0.548   1.946   0.231
       1.5    -0.001   1.439   1.272   1.282   2.219   1.188   3.031   1.258

  25   0.25   -0.001   0.847   1.110   0.858   1.969   0.576   2.210   0.408
       0.5    -0.001   0.891   1.184   0.837   2.016   0.572   2.359   0.278
       0.6    -0.001   0.980   1.203   0.847   1.965   0.528   2.263   0.266
       0.693  -0.000   0.956   1.174   0.811   1.871   0.482   2.079   0.227
       0.8    -0.001   1.009   1.234   0.871   2.004   0.539   2.232   0.253
       0.9    -0.001   1.052   1.290   0.964   2.223   0.784   2.818   0.658
       1.0    -0.001   1.055   1.295   0.979   2.263   0.858   2.949   0.820
       1.5    -0.001   1.327   1.384   1.186   2.394   1.070   3.167   1.114
       1.7    -0.001   1.536   1.521   1.363   2.549   0.968   3.047   0.628
       2.0    -0.001   1.738   1.653   1.600   2.845   1.248   3.492   0.884
       3.0    -0.001   1.792   1.627   1.749   2.928   1.814   4.045   1.883

  31   0.5    -0.000   0.869   1.218   0.805   2.046   0.557   2.385   0.277
       0.693  -0.001   0.964   1.268   0.880   2.192   0.734   2.778   0.607
       1.0    -0.001   1.018   1.323   0.951   2.300   0.823   2.969   0.787
       1.5    -0.001   1.301   1.404   1.155   2.410   1.043   3.176   1.092

  61   0.5
       0.693  -0.000   0.866   1.256   0.873   2.267   0.693   2.840   0.543
       1.0
       1.5

Note. Acceptance Range = 0.30






Table 3

Expected Values and Standard Deviations
from TREEIP on Bimodal Item Pools
Varying Pool Size, Step Size and Ability Level

                                    Ability Level
Pool   Step  ------ 0 ------ ------ 1 ------ ------ 2 ------ ------ 3 ------
Size   Size     E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ

   9   0.5    -0.004   1.020   0.231   0.443   1.312   0.495   1.473   0.245
       0.693  -0.004   1.095   0.951   0.968   1.601   0.666   1.903   0.383
       1.0    -0.001   1.264   1.036   1.042   1.639   0.628   1.876   0.331
       1.5    -0.001   1.442   1.216   1.326   2.187   1.252  illeg.   1.291

  13   0.5    -0.001   1.006   1.009   0.903   1.671   0.579   1.917   0.275
       0.693  -0.001   1.104   1.001   0.945  illeg.   0.630   1.932   0.358
       1.0    -0.000   1.267   1.143   1.011   1.754   0.551   1.946   0.238
       1.5    -0.000   1.436   1.274   1.276   2.217   1.181   3.029   1.252

  25   0.25   -0.000  illeg.  illeg.  illeg.  illeg.  illeg.  illeg.   0.421
       0.5    -0.001   0.870   1.152   0.867   2.024   0.594   2.373   0.278
       0.6    -0.001   0.951   1.173   0.875   1.976   0.536   2.271   0.242
       0.693  -0.001   0.964   1.207   0.933   2.174   0.768   2.774   0.612
       0.8    -0.001   0.953   1.183   0.887   2.020  illeg.   2.335   0.272
       0.9    -0.002   1.025   1.260   0.994   2.246   0.780   2.833   0.631
       1.0    -0.001   1.017   1.257   1.002   2.280   0.860   2.969   0.791
       1.5    -0.001   1.294   1.350   1.192   2.396   1.064   3.176   1.091
       1.7     0.002   1.491   1.483   1.362   2.543   0.959   3.047   0.612
       2.0    -0.000   1.717   1.609   1.592   2.831   1.235   3.485   0.871
       3.0    -0.001   1.761   1.601   1.763   2.953   1.803   4.070   1.857

  31   0.5    -0.000   0.796   1.145   0.816   2.060   0.621   2.476   0.406
       0.693  -0.000   0.924   1.229   0.912   2.218   0.741   2.814   0.585
       1.0    -0.000   0.957   1.262   0.956   2.298   0.832   3.004   0.758
       1.5    -0.002   0.968   1.284   1.049   2.446   1.080   3.338   1.015

  61   0.5     0.006   0.726
       0.693  -0.000   0.857   1.245   0.876   2.281   0.688   2.852   0.525
       1.0     0.033   0.867
       1.5     0.185   1.128

Note. Acceptance Range = 0.30



Table 4

Expected Values and Standard Deviations
from TREEIP on Various Shaped Item Pools

                                    Ability Level
             ------ 0 ------ ------ 1 ------ ------ 2 ------ ------ 3 ------
Pool Shape      E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ    E(θ)     S_θ

normal        -0.000   0.866   1.256   0.873   2.267   0.693   2.840   0.543
bimodal       -0.000   0.857   1.245   0.876   2.281   0.688   2.852   0.525
rectangular   -0.001   0.610   1.008   0.677   2.138   0.805   3.111   0.566
skewed         0.040   0.815   1.282   0.858   2.257   0.670   2.801   0.561

Note. All runs were on pools with 61 items with the stepsize and acceptance range
program parameters set at 0.693 and 0.30 respectively.






Table 5

Means and Standard Deviations
from SIMIP on a Bimodal and
Skewed Item Pool Varying
Number of Test Administrations

                            Shape of Pool
Number of Tests        Bimodal           Skewed
Administered         Mean    Sθ        Mean    Sθ

      25             2.207   0.627     2.193   0.622
      50             2.242   0.634     2.225   0.627
      75             2.262   0.645     2.216   0.603

Note. Ability set at 2.0. Both the pools had 61 items.



Table 6

Comparison of TREEIP Results from
25 Item Rectangular and Normal Item Distributions

                                                              Ability Level
Acceptance  Step  Distribution        0.0              0.5              1.0              2.0              3.0
Range       Size  Shape            E(θ)  Sθ         E(θ)  Sθ         E(θ)  Sθ         E(θ)  Sθ         E(θ)  Sθ

0.1         0.5   R              -0.001  0.918     0.470  0.927     0.944  0.943     1.893  0.968     2.764  0.884
                  N              -0.009  0.951     0.522  0.904     0.980  0.762     1.468  0.426     1.555  0.251

0.3         0.7   R              -0.013  0.787     0.430  0.824     0.911  0.893     1.986  0.984     2.933  0.773
                  N              -0.000  0.959     0.623  0.922     1.169  0.821     1.877  0.491     2.093  0.231



Table 7

Means and Standard Deviations
from SIMIP on ETIPL Item Pool
Varying Stepsize

                                      Ability Level
Stepsize               -3       -2       -1        0        1        2        3

  .1      Mean     -2.886   -2.145   -0.992   -0.050    1.135
          Sθ        0.715    0.728    0.486    0.534    0.502    0.627    0.788
          Mni*      13.04    15.88    19.24    20.00    20.00    19.84    18.40

  .2      Mean     -2.779   -2.230   -1.132    0.129    0.952    2.009    2.972
          Sθ        0.491    0.681    0.550    0.461    0.374    0.515    0.857
          Mni*      12.24    13.96    19.68    20.00    20.00    19.76    18.24

  .3      Mean     -3.157   -2.139   -1.134    0.064    1.018    2.055    3.213
          Sθ        0.652    0.645    0.800    0.503    0.3?3    0.516    0.844
          Mni*      10.04    14.48    18.56    20.00    19.92    19.56    16.08

  .4      Mean     -3.168   -2.250    ?.052    0.001    1.070    1.987    2.910
          Sθ        0.611    0.782    ?.547    0.518    0.444    0.531    0.554
          Mni*       9.56    17.04    19.24    20.00    20.00    19.48    18.12

  .5      Mean     -2.762   -2.096   -1.122   -0.070    1.136    2.076    3.053
          Sθ        0.539    0.619    0.700    0.539    0.562    0.548    0.718
          Mni*       9.20    14.72    18.12    20.00    20.00    19.40    16.28

Note. All runs made with 25 administrations per ability level, 20 item upper
limit, and acceptance range = .25.

*Mni = mean number of items administered.



Table 7 (Cont.)

Means and Standard Deviations
from SIMIP on ETIPL Item Pool
Varying Stepsize

                                      Ability Level
Stepsize               -3       -2       -1        0        1        2        3

  .7      Mean     -3.061   -2.175   -1.026   -0.065    1.029    1.950    2.913
          Sθ        0.561    0.460    0.573    0.469    0.516    0.696    0.533
          Mni*       ?.80    13.12    18.92    20.00    20.00    19.16    15.92

  ?       Mean     -?.134   -2.?71   -1.241    0.094    0.959    2.029    3.310
          Sθ        0.499    0.790    0.898    0.419    0.380    0.531    0.799
          Mni*       5.92    11.40    16.96    20.00    19.84    19.20    13.28

  1.5     Mean     -3.739   -2.501   -1.389    0.101    1.035    2.437    3.239
          Sθ        0.876    0.961    0.910    0.598    0.792    1.118    1.010
          Mni*       5.32    10.80    18.04    20.00    19.32    16.16    12.72

  2.0     Mean     -3.683   -2.972   -1.482   -0.329    1.100    2.032    3.631
          Sθ        0.514    1.044    1.194    1.175    0.450    0.913    1.345
          Mni*       4.24     8.76    16.56    18.56    19.96    18.48    13.36

  3.0     Mean     -4.530   -2.942   -1.751   -0.042    1.230    2.511    4.471
          Sθ        1.591    1.494    1.916    0.465    1.117    1.556    1.519
          Mni*               10.68    16.52    20.00    19.28    17.04     8.60

Note. All runs made with 25 administrations per ability level, 20 item upper limit,
and acceptance range = .25.

*Mni = mean number of items administered.






Table 8

Expected Values and Standard Deviations
from TREEIP on 25 Item Rectangular Pool
by Step Size and Acceptance Range

                                             Step Size
Ability  Acceptance       0.5            0.693            1.0             1.?
Level    Range         E(θ)   Sθ       E(θ)   Sθ       E(θ)   Sθ       E(θ)   Sθ

0.0      .1           -0.00  0.92     -0.00  0.84     -0.00  1.04     -0.01  1.07
         .2           -0.00  0.81     -0.02  1.01     -0.00  0.88     -0.00  1.06
         .3           -0.00  0.74     -0.01  0.79     -0.00  0.83     -0.00  0.97
         .4           -0.00  0.76     -0.01  0.78     -0.00  0.81     -0.00  0.93

1.0      .1            0.94  0.94      0.55  0.80      1.08  1.06      0.89  1.23
         .2            0.89  0.87      1.00  1.04      1.00  0.94      0.90  1.22
         .3            0.86  0.84      0.91  0.89      0.99  0.90      1.11  1.09
         .4            0.94  0.81      0.36  0.83      1.00  0.89      1.10  1.07

2.0      .1            1.89  0.97      0.97  0.66      2.09  1.03      1.99  1.45
         .2            1.92  0.99      1.88  0.97      2.08  1.04      2.00  1.45
         .3            1.93  1.00      1.98  0.93      2.10  1.04      2.32  1.22
         .4            2.01  0.92      2.03  0.91      2.12  1.02      2.33  1.21

3.0      .1            2.76  0.88      1.21  0.46      2.93  0.93      3.09  1.39
         .2            2.89  0.85      2.74  0.89      3.08  0.91      3.10  1.39
         .3            2.96  0.81      2.92  0.76      3.14  0.87      3.39  1.04
         .4            3.00  0.74      2.97  0.72      3.16  0.84      3.42  1.01



Table 9

Means and Standard Deviations
from SIMIP on VCIPL Item Pool
Varying Acceptance Range

                                      Ability Level
Acceptance
Range                  -4       -2       -1        0        1        2        3

  .1      Mean     -1.938   -?.713   -0.994   -0.491    1.101    1.873    2.913
          Sθ        0.430    0.794    0.573    0.810    0.976    0.676    0.447
          Mni*       3.64     5.56     6.84     8.52    11.12     9.72     6.24

  .2      Mean     -2.747   -2.133   -1.193   -0.152    1.208    2.268    2.889
          Sθ        0.790    0.520    0.779    0.544    0.739    0.686    0.540
          Mni*       6.96     8.68    12.56    14.44    15.44    10.56     7.20

  .3      Mean     -2.955   -2.085   -1.311   -0.021    1.026    2.229    3.105
          Sθ        0.823    0.555    0.943    0.385    0.578    0.581    0.510
          Mni*       7.00    10.00    11.24    18.96    17.24    12.96     7.68

  .4      Mean     -3.171   -2.404   -1.346   -0.007    0.869    2.234    2.950
          Sθ        0.690    0.538    0.681    0.344    0.399    0.775    0.579
          Mni*       7.08     8.08    14.60    18.44    19.72    14.64     9.64

  .5      Mean     -3.157   -2.242   -1.051    0.160    0.941    2.340    3.117
          Sθ        0.606    0.791    0.619    0.755    0.546    0.780    0.497
          Mni*       8.16     11.4    17.04    18.64    19.28    14.32     9.48

Note. All runs made with 25 administrations per ability level, 20 item
upper limit, and stepsize = .693.

*Mni = mean number of items administered.






Table 10

Means and Standard Deviations
from SIMIP on ETIPL Item Pool
Varying Acceptance Range

                                      Ability Level
Acceptance
Range                  -3       -2       -1        0        1        2        3

  .1      Mean     -2.526   -2.200   -1.174   -0.111    0.974    2.001    3.299
          Sθ        0.559    0.667    0.700    0.569                          ?
          Mni*       6.64     9.24    17.16    26.60    27.80    16.44    11.16

  .2      Mean     -2.989   -2.159   -1.144   -0.016    0.926    2.152    3.451
          Sθ        0.491    0.559    0.731    0.332    0.362    0.464    0.765
          Mni*       7.20        ?             31.60    33.60    22.36    13.40

  .3      Mean     -3.103   -2.475   -1.162    0.003    1.016    2.161    3.024
          Sθ        0.576    0.594    0.630    0.239    0.40?    0.410    0.747
          Mni*       7.60    12.40    27.96    36.72    37.88    25.32    18.16

  .4      Mean     -3.064   -2.359   -1.121   -0.094    1.043    2.073    3.054
          Sθ        0.615    0.815    0.582    0.261    0.316    0.336    0.520
          Mni*      10.20    18.00    31.40    39.00    39.36    31.52    20.12

  .5      Mean     -3.378   -2.465   -1.088    0.031    0.993    1.920    3.195
          Sθ        0.716    0.715    0.510    0.394    0.356    0.389    0.584
          Mni*      10.24    18.48    35.08    39.80    39.48    35.12    20.76

Note. All runs made with 25 administrations per ability level, 40 item upper
limit, and stepsize = .693.

*Mni = mean number of items administered.






Table 11

Means and Standard Deviations from
SIMIP on ETIPL Item Pool and Comparable
Ideal Item Pool

                                      Ability Level
                       -3       -2       -1        0        1        2        3

ETIPL     Mean     -3.061   -2.175   -1.026   -0.065    1.029    1.950    2.913
Pool      Sθ        0.561    0.460    0.573    0.469    0.516    0.696    0.533

Ideal     Mean     -3.036   -2.404   -1.037   -0.017    1.148    2.222    3.070
Pool      Sθ        0.441    0.703    0.652    0.462    0.787    0.718    0.460

Note. All runs made with 25 administrations per ability level, 20 item upper
limit, stepsize = 0.70, and acceptance range = 0.25.






Minimum Item Pool Requirements
for a Rectangular Idealized Pool Given
Classification Interval and Ability Range

Ability        Classification   Pool    Minimum           Simulated
Range          Interval Size    Size    Session Length    Length*

(-3.5, 3.5)    0.5               15     3.9               2
(-3.5, 3.5)    0.25              29     4.86              4
(-3.5, 3.5)    0.125             55     5.8               8
(-3.5, 3.5)    0.0625           113     6.8               8
(-3.5, 3.5)    0.03125          225     7.8               7

*Note. Number of items administered to closest approximation of θ
value within classification interval.
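
The minimum session lengths printed in this table are numerically close to the base-2 logarithm of the pool size, which is what one would expect if each administered item roughly halves the set of candidate classification intervals. The sketch below only checks that reading against the printed rows. The interpretation itself, the helper name `min_session_length`, and the tolerance are assumptions made for illustration; the table does not state the formula.

```python
import math

# Rows copied from the table above:
# (classification interval size, pool size, printed minimum session length).
PRINTED_ROWS = [
    (0.5, 15, 3.9),
    (0.25, 29, 4.86),
    (0.125, 55, 5.8),
    (0.0625, 113, 6.8),
    (0.03125, 225, 7.8),
]


def min_session_length(pool_size: int) -> float:
    """Assumed relation (hypothetical helper): log2 of the pool size."""
    return math.log2(pool_size)


def check_rows(rows=PRINTED_ROWS, tol=0.06):
    """Compare each printed length with log2(pool size).

    Returns a list of (pool_size, printed, computed, agrees) tuples, where
    `agrees` is True when the two differ by no more than `tol`.
    """
    results = []
    for _interval, pool_size, printed in rows:
        computed = min_session_length(pool_size)
        agrees = abs(computed - printed) <= tol
        results.append((pool_size, printed, round(computed, 2), agrees))
    return results


if __name__ == "__main__":
    for pool_size, printed, computed, agrees in check_rows():
        print(f"pool={pool_size:4d}  printed={printed:5.2f}  "
              f"log2={computed:5.2f}  agrees={agrees}")
```

Under this reading, the "Simulated Length" column can be compared against the logarithmic lower bound row by row; the simulated values fall both above and below it, so the bound should be treated as a rough guide rather than a guarantee.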






Figure 1

Procedural Operation of TREEIP
on a Nine Item Pool with
Stepsize = 1.0 and Acceptance Range = 0.3

[Tree diagram not reproduced. Column labels: Item Parameters; Probability of Response; Estimate (Item Selected). Item parameters: 3.00, 2.25, 1.50, 0.75, 0.00, -0.75, -1.50, -2.25, -3.00. Recoverable nodes, given as estimate (item selected): 1.00 (0.75); -1.00 (-0.75); -2.00 (-2.25).]

Note. The * indicates that no item was available in the pool within ± the
acceptance range.
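
The branching shown in Figure 1 can be illustrated with a short sketch. It assumes a simple rule read off the recoverable nodes of the figure: the estimate moves up by the stepsize after a correct response and down after an incorrect one, and the next item is the pool item closest to the new estimate, provided it lies within the acceptance range. The function names are hypothetical, and the actual TREEIP program may differ in details not recoverable here (for example, how final ability estimates are computed).

```python
import math

# Nine-item pool from Figure 1 (Rasch item difficulties).
POOL = [3.00, 2.25, 1.50, 0.75, 0.00, -0.75, -1.50, -2.25, -3.00]


def rasch_probability(theta: float, difficulty: float) -> float:
    """One-parameter logistic (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def next_target(estimate: float, correct: bool, stepsize: float) -> float:
    """Assumed rule: step up after a correct response, down otherwise."""
    return estimate + stepsize if correct else estimate - stepsize


def select_item(target, available, acceptance_range):
    """Return the available difficulty closest to `target`.

    Returns None when no item lies within the acceptance range, which is
    the condition the note to Figure 1 flags.
    """
    if not available:
        return None
    best = min(available, key=lambda b: abs(b - target))
    # Small epsilon guards against floating-point noise at the boundary.
    if abs(best - target) <= acceptance_range + 1e-9:
        return best
    return None


if __name__ == "__main__":
    stepsize, acceptance = 1.0, 0.3
    up = next_target(0.0, True, stepsize)
    down = next_target(0.0, False, stepsize)
    print(up, select_item(up, POOL, acceptance))
    print(down, select_item(down, POOL, acceptance))
```

With the figure's settings, a target of 1.00 selects the item at 0.75 and a target of -1.00 selects the item at -0.75, matching the nodes that survive in the figure. A target outside the pool's span (say 4.0) returns `None`, which under this sketch is where testing would stop.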






Figure 2

Relationship Between Item Pool Size
and the E(θ) and Sθ

[Plot not reproduced. Vertical axis: 0.1 to 1.2. Horizontal axis: Item Pool Size (9, 13, 25, 31, 61, 72, 180). Conditions: θ = 1.0, Stepsize = 0.693, Acceptance Range = 0.30.]



Figure 3

Frequency Distribution
of Difficulty Values:
72 Item VCIPL Pool

[Histogram not reproduced. Vertical axis: frequency, 1 to 14. Horizontal axis: RANGE of difficulty values in steps of 0.3, running from 3.3 downward through 0 into negative values.]



Figure 4

Frequency Distribution
of Difficulty Values:
180 Item ETIPL Pool

[Histogram not reproduced. Vertical axis: frequency, up to 20. Horizontal axis: RANGE of difficulty values in steps of 0.3, running downward through 0 into negative values.]



Figure 5

Relationship Between Stepsize
and the E(θ) and Sθ

[Plot not reproduced. Key: rectangular, normal, bimodal, true θ (top figure). Horizontal axis: stepsize (.693, 1.0, 1.5). Conditions: θ = 1.0, pool size = 31, acc. range = .3.]



